A Paradigmatic Regression Algorithm for Gene Selection Problems
Abstract
Motivation: Gene selection has become a common task in most gene expression studies. The objective of such research is often to identify the smallest possible set of genes that can still achieve good predictive performance. The problem of assigning tumours to a known class is a particularly important example that has received considerable attention in the last ten years. Many of the classification methods proposed recently require some form of dimension reduction of the problem. These methods provide a single model as an output and, in most cases, rely on the likelihood function in order to achieve variable selection.
Results: We propose a prediction-based objective function that can be tailored to the requirements of practitioners and can be used to assess and interpret a given problem. The direct optimization of such a function can be very difficult because the problem is potentially discontinuous and nonconvex. We therefore propose a general procedure for variable selection that resembles importance sampling to explore the feature space. Our proposal compares favorably with competing alternatives when applied to two cancer data sets, in that smaller models are obtained for better or at least comparable classification errors. Furthermore, by providing a set of selected models instead of a single one, we construct a network of possible models for a target prediction accuracy level.
Contact: [email protected]
The first two authors are joint first authors.
arXiv:1511.07662v1 [stat.ME] 24 Nov 2015
Similar resources
SFLA Based Gene Selection Approach for Improving Cancer Classification Accuracy
In this paper, we propose a new gene selection algorithm based on the Shuffled Frog Leaping Algorithm, called SFLA-FS. The proposed algorithm is used to improve cancer classification accuracy. Most biological datasets, such as cancer datasets, have a large number of genes and few samples. However, most of these genes are not usable in some tasks, for example in cancer classification....
Robust high-dimensional semiparametric regression using optimized differencing method applied to the vitamin B2 production data
Background and purpose: With advances in science, knowledge, and technology, we deal with high-dimensional data in which the number of predictors may considerably exceed the sample size. The main problems with high-dimensional data are the estimation of the coefficients and interpretation. For high-dimensional problems, classical methods are not reliable because of a large number of predictor variable...
An optimization technique for vendor selection with quantity discounts using Genetic Algorithm
Vendor selection decisions are complicated by the fact that various conflicting multi-objective factors must be considered in the decision-making process. The problem of vendor selection becomes still more complicated with the inclusion of an incremental discount pricing schedule. Such hard combinatorial problems, when solved using metaheuristics, produce near-optimal solutions. This paper propose...
Penalized Bregman Divergence Estimation via Coordinate Descent
Variable selection via penalized estimation is appealing for dimension reduction. For penalized linear regression, Efron et al. (2004) introduced the LARS algorithm. Recently, the coordinate descent (CD) algorithm was developed by Friedman et al. (2007) for penalized linear regression and penalized logistic regression and was shown to gain computational superiority. This paper explores...
Diagnosis of the disease using an ant colony gene selection method based on information gain ratio using fuzzy rough sets
With the advancement of metagenome data mining, science has become focused on microarrays. Microarrays are datasets with a large number of genes that are usually irrelevant to the output class; hence, the process of gene selection or feature selection is essential. It follows that redundant genes can be removed, increasing the speed and accuracy of classification. After applying the gene se...
Publication date: 2015